Decoupled iteration mapping: improving dependency-loop performance on SIMD processors
Similar resources
Improving Database Performance on Simultaneous Multithreading Processors
Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. Because threads share processor resources, an SMT system is inherently different from a multiprocessor system and, therefore, utilizing multiple threads on an SMT processor creates new challenges for database implementers. We investigate three thread-based tec...
Improving Search Engines Performance on Multithreading Processors
In this paper we present strategies and experiments that show how to take advantage of the multi-threading parallelism available in Chip Multithreading (CMP) processors in the context of efficient query processing for search engines. We show that scalable performance can be achieved by letting the search engine go synchronous so that batches of queries can be processed concurrently in a simple ...
Iteration Mapping: Loop Software Pipelining on an XIMD
The multiple instruction streams, low synchronization cost and synchronous nature of the XIMD (variable instruction stream, multiple data stream) architecture create an opportunity for a new architecture-compiler interface. As an extension to the VLIW (Very Long Instruction Word) architecture, the XIMD can exploit all VLIW scheduling techniques but these do not take full advantage of the unique...
Decoupled Value Prediction on Trace Processors
Value prediction is a technique that breaks true data dependences by predicting the outcome of an instruction and speculatively executing its data-dependent instructions based on the predicted outcome. In this paper, we address several implementation issues for value prediction which are important on wide-issue superscalar architectures, and present a value prediction scheme based on the trace ...
Improving Memory Performance for Indirect Accesses on SIMD Computers
SIMD machines operate more efficiently on a wider range of problems when they have the ability to access memory with both global and local addresses. Recent work has made possible the use of caches for global addresses. This paper examines techniques for employing caches to improve memory accesses with local addresses. Specifically, we examine the improvement from utilizing a cluster-based indir...
Journal
Journal title: IEICE Electronics Express
Year: 2013
ISSN: 1349-2543
DOI: 10.1587/elex.10.20130798